\section{Task Definitions}
\label{sec:relation extraction:definition}
The relation extraction task was shaped by several datasets with different goals.
The first \textsc{muc}s focused on detecting naval sightings and engagement in military messages.
Subsequent conferences moved towards the extraction of business-related relations in news reports.
Nowadays, general encyclopedic knowledge is usually extracted from either news reports or encyclopedia pages.
Another common goal is to extract interactions between drugs, chemicals, and symptoms in biomedical texts~\parencite{biobert}.
For further details, Appendix~\ref{chap:datasets} contains a list of datasets with information about the source of the text and the nature of the relations to be extracted.
Depending on the end-goal for which relation extraction is used, different definitions of the task might be more fitting.
We now formally define the relation extraction task and explore its popular variants.

\begin{marginparagraph}
    For ease of notation, we changed the placement of entities in the tuple corresponding to a fact from the one used in Section~\ref{sec:context:knowledge base}.
    This will allow us to refer to the entity pair as \(\vctr{e}\in\entitySet^2\).
\end{marginparagraph}
In relation extraction, we assume that information can be represented as a knowledge base \(\kbSet\subseteq\entitySet^2\times\relationSet\) as defined in Section~\ref{sec:context:knowledge base}.
In addition to the set of entities \(\entitySet\) and the set of relations \(\relationSet\), we need to define the source of information from which to extract relations.
The information source can come in several different forms, but we use a single basic definition on sentences which we can refine later on.
We assume entity chunking was performed on our input data.
We only deal with binary relations%
\sidenote[][-1cm]{
    As described in Section~\ref{sec:context:relation algebra}, this means that only relations between two entities are considered.
    Moreover, higher-arity relations can be decomposed into sets of binary ones.
}
since they are the ones commonly encoded in knowledge bases.
We can therefore define \(\sentenceSet\) as a set of sentences with two tagged and ordered entities:
\begin{align*}
    \sentenceSet = \{ & \text{``\uhead{Jan Kasl} became mayor of \utail{Prague}.''},\\
                      & \text{``\utail{Vincent Callebaut} was born in 1977 in \uhead{Belgium}.''},\\
                      & \dotsc\}.
\end{align*}
\begin{marginparagraph}
    Relation extraction can also be performed on semi-structured documents, such as a Wikipedia page with its infobox or an \textsc{html} page that might contain lists and tables.
    This is the case of \textsc{dipre} presented in Section~\ref{sec:relation extraction:dipre}.
    As long as the semi-structured data can be represented as a token list, standard text models can still be applied.
\end{marginparagraph}
In this example, two sentences are given; in each sentence, the relation we seek is the one between the two entities marked by underlines.
The entities need to be ordered since most relations are asymmetric (\(r\neq\breve{r}\)).
In practice, this means that one entity is tagged as \(e_1\) and the other as \(e_2\).
The standard setting is to work on sentences; this can of course be generalized to larger chunks of text if needed.

The tagged entities inside the sentences of \(\sentenceSet\) are not the same as entities in knowledge bases.
They are merely surface forms.
These surface forms are not sensu stricto elements of \(\entitySet\).
Indeed, the same entity can have several different surface forms, and the same surface form can be linked to several different entities depending on context.
To map these tagged surface forms to \(\entitySet\), entity linking is usually performed on the corpus.
In practice, this means that we consider samples from \(\sentenceSet\times\entitySet\times\entitySet\).
Finally, since the two tagged entities are ordered, we simply assume that the first entity in the tuple corresponds to the entity tagged \(e_1\) in the sentence, while the second entity refers to \(e_2\).%
\sidenote{Note that \(e_2\) can appear before \(e_1\) in the sentence.}
If entity linking is not performed on the dataset, we can simply assume that the surface forms are actually entities; in this case, and in this case alone, \(\entitySet\) is a set of surface forms.
This is somewhat uncommon, the standard practice being to have linked entities.

Also, note that this setup is still valid for sentences with three or more entities, as we can consider all possible entity pairs:
\begin{align*}
    \sentenceSet = \{ & \parbox[t]{10cm}{``\uhead{Alonzo Church} was born on June 14, 1903, in \utail{Washington, D.C.}, where his father, Samuel Robbins Church, was the judge of the Municipal Court for the District of Columbia.'',}\\
                      & \parbox[t]{10cm}{``\utail{Alonzo Church} was born on June 14, 1903, in Washington, D.C., where his father, \uhead{Samuel Robbins Church}, was the judge of the Municipal Court for the District of Columbia.'',}\\
                      & \dotsc\}.
\end{align*}
In this example, we give two elements from \(\sentenceSet\); these elements are different since their markings \(\uent{\quad}\) differ.
We often use the word sentence without qualifications to refer to elements from \(\sentenceSet\).
Still, even though the two sentences above are the same in the familiar sense of the term, they are different in our definition.

Now, given a sentence with two tagged, ordered, and linked entities, we can state the goal of relation extraction as finding the semantic relation linking the two entities as conveyed by the sentence.
Since the set of possible relations is designated by \(\relationSet\), we can sum up the relation extraction task as finding a mapping taking the form:
\begin{equation}
    \boxed{
        f_\text{sentential}\colon \sentenceSet\times\entitySet^2 \to \relationSet
    }
    \label{eq:relation extraction:sentential definition}
\end{equation}

When we have access to a supervised dataset, all the information (head entity, relation, tail entity, conveying sentence) is provided.
Table~\ref{tab:relation extraction:supervised samples} gives some examples of supervised samples.
We denote a dataset of sentences with tagged, ordered, and linked entities as \(\dataSet\subseteq\sentenceSet\times\entitySet^2\) and a supervised dataset as \(\dataSet_\relationSet\subseteq\dataSet\times\relationSet\).
Given an entity pair \(\vctr{e}=(e_1, e_2)\), a sample in which these entities appear \((s, e_1, e_2)\) is called a \emph{mention}.
A sample which conveys a fact \tripletHolds{e_1}{r}{e_2} is called an \emph{instance} of \(r\).
\begin{marginparagraph}
    Mentions as defined here can be called ``entity mentions,'' while instances may be referred to as ``relation mentions.''
\end{marginparagraph}

\begin{table}
    \input{mainmatter/relation extraction/supervised samples.tex}
    \scaption[Example of supervised samples from the FewRel dataset]{
        Samples from the FewRel dataset.
        The surface forms in the head, relation and tail columns are only given for ease of reading and are usually not provided.
        \label{tab:relation extraction:supervised samples}
    }
\end{table}

The relation extraction task as stated by Equation~\ref{eq:relation extraction:sentential definition} is called \emph{sentential extraction}.
It is the traditional relation extraction setup: the sentences are considered one by one, and a relation is predicted for each sentence separately.
However, information can be leveraged from the regularities of the dataset itself.
Indeed, some facts can be repeated in multiple sentences, in which case a model could enforce some kind of consistency on its predictions.
Even beyond a simple consistency of the relations predicted, in the same fashion that a word can be defined by its context, so can an entity.
This kind of regularity can be exploited by modeling a dependency between samples even when conditioned on the model parameters.
While tackling relation extraction at the sentence level might be sufficient for some datasets, others might benefit from larger context, especially when the end goal is to build a knowledge base containing general facts.
This gives rise to the \emph{aggregate extraction} setting, in which a set of tagged sentences is directly mapped to a set of facts without a direct correspondence between individual sentences and individual facts.
\begin{marginparagraph}
    The left-hand side of Equation~\ref{eq:relation extraction:aggregate definition} is a subset of \(\sentenceSet\times\entitySet^2\), that is \(\dataSet\) or a subset thereof.
    On the right-hand side, we have a subset of \(\entitySet^2\times\relationSet\); we intend to find \(\kbSet\) or a subset thereof.
    However, each individual sample \((s, \vctr{e})\in\dataSet\) does not need to be mapped to an individual fact \((\vctr{e}, r)\in\kbSet\).
\end{marginparagraph}
\begin{equation}
    \boxed{
        f_\text{aggregate}\colon 2^{\sentenceSet\times\entitySet^2} \to 2^{\entitySet^2\times\relationSet}
    }
    \label{eq:relation extraction:aggregate definition}
\end{equation}
Quite often in this case, the problem is tackled at the level of entity pairs, meaning that instead of making a prediction from a sample in \(\sentenceSet\times\entitySet^2\), the prediction is made from \(2^\sentenceSet\times\entitySet^2\).
This setup is required for multi-instance approaches presented in Section~\ref{sec:relation extraction:miml}.
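As a minimal illustration of the aggregate setting, consider a hypothetical dataset with two sentences \(s_1, s_2\in\sentenceSet\) both tagged with the same entity pair \(\vctr{e}\), and a third sentence \(s_3\) tagged with another pair \(\vctr{e}'\).
The sentences, the relation labels \textsl{born in} and \textsl{mayor of}, and the output below are assumptions made for the sake of the example, not the prediction of an actual model:
\begin{equation*}
    f_\text{aggregate}\big(\{(s_1, \vctr{e}), (s_2, \vctr{e}), (s_3, \vctr{e}')\}\big) = \{(\vctr{e}, \textsl{born in}), (\vctr{e}', \textsl{mayor of})\}.
\end{equation*}
Here three samples yield two facts, and neither \(s_1\) nor \(s_2\) is individually tied to the fact involving \(\vctr{e}\).
When working at the level of entity pairs, the same input would instead be grouped as \((\{s_1, s_2\}, \vctr{e})\) and \((\{s_3\}, \vctr{e}')\), each of these elements of \(2^\sentenceSet\times\entitySet^2\) receiving its own prediction.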
Aggregate extraction may impose a relatively more transductive approach%
\sidenote{
    Transductive approaches are contrasted with inductive approaches.
    In the inductive approach---such as neural networks---parameters \(\vctr{\theta}\) are estimated from the training set.
    When labeling an unknown sample, the model makes its prediction only from parameters \(\vctr{\theta}\) and the unlabeled sample; access to the training set is no longer necessary.
    This is called induction since ``rules'' (\(\vctr{\theta}\)) are obtained from examples.
    On the other hand,%
    \unskip\parfillskip 0pt% XXX Too lazy to properly handle sidenote page break
}
since predictions rely directly on previously observed samples.
Usually, aggregate models still extract some form of prediction at the sentence level, even if they do not need to.
Therefore, the key point of aggregate approaches is the explicit handling of dataset-level information.
Some models may heavily depend on this global information, to the point that they cannot be trained without some form of repetition in the dataset.
The sentential--aggregate distinction constitutes a spectrum.
While all unsupervised methods exhibit some aggregate traits, they do not necessarily exploit as much structural information as they could; this is the key point of Chapter~\ref{chap:graph}.

\subsection{Nature of Relations}
\begin{marginparagraph}[-9mm]% XXX Remainder of the Transductive approaches \sidenote
    in the transductive approach---such as \textsc{k-nn}---observations on the train set are directly transferred to test samples without first generalizing to a set of rules.
\end{marginparagraph}
The supervised relation extraction task described above is quite generic.
The approaches to tackle it in practice vary quite a lot depending on the specific nature of the facts we seek to extract and the corpus structure.
In this subsection, we present some variations on the nature of \(\relationSet\) commonly encountered in the literature.

\subsubsection{Unspecified Relation: \textsl{Other}}
\label{sec:relation extraction:other}
The set \(\relationSet\) is built using a finite set of labels.
These labels do not describe the relationship between all entities in all possible sentences.
Indeed, some entities are deemed unrelated in some sentences.
A distinction is sometimes made between relation extraction and relation detection, depending on whether a relation is assumed to exist between the two entities in a sentence or not.
This apparent absence of relation is often called ``\textsl{other},'' since a relation between the two entities might exist but is simply not present in the relation schema considered~\parencitex{semeval2010task8}.
In this case, we can still use the usual relation extraction setup by augmenting \(\relationSet\) with the following relation:
\begin{marginparagraph}
    We use the notation of Section~\ref{sec:context:relation algebra} where \(\bar{r}\) refers to the complementary relation of the named relations \(r\) in the schema \(\relationSet\).
    Note that using the definition of relations as a set of entity pairs is not strictly correct here since two entities may be linked by a relation that is simply not conveyed by a specific sentence containing them.
    The underlying problem to this notational conundrum is the fact that \textsl{other} is only needed for mono-relation extraction when one and exactly one relation must be predicted for a sample; see Section~\ref{sec:relation extraction:miml} for an alternative.
    The definition given in Equation~\ref{eq:relation extraction:other} is nonetheless fitting to the widespread distant supervision setting which we describe in Section~\ref{sec:relation extraction:distant supervision}.
\end{marginparagraph}
\begin{equation}
    \textsl{other} = \bigcap_{r\in\relationSet} \bar{r}.
    \label{eq:relation extraction:other}
\end{equation}
However, note that ``\textsl{other}'' is not a relation like the others: it is defined by what it is not instead of being defined by what it is.
This peculiarity calls for special care on how it is handled, especially during evaluation.

\subsubsection{Closed-domain Assumption}
\label{sec:relation extraction:domain restriction}
As stated above, the set \(\relationSet\) is usually built from a finite set of labels such as \textsl{parent of} and \textsl{part of}.
This is referred to as the \emph{closed-domain assumption}.
Another approach is to consider that \(\relationSet\) is not known beforehand~\parencitex{oie}.
In particular, open information extraction (\textsc{oie}, Section~\ref{sec:relation extraction:oie}) directly uses surface forms as relation labels.
In this case, the elements of \(\relationSet\) are strings of words, not defined in advance, and potentially infinite in number.
We can see \textsc{oie} as a preliminary task to relation extraction: the set of surface forms can be mapped to a traditional closed set of labels.
When \(\relationSet\) is not known beforehand, the relation extraction problem can be called \emph{open-domain relation discovery}.
This is the usual setup for unsupervised relation extraction described in Section~\ref{sec:relation extraction:unsupervised}.

\subsubsection{Directionality and Ontology}
\label{sec:relation extraction:directionality}
Most relations \(r\) are not symmetric (\(r\neq\breve{r}\)).
There are several different approaches to handle this asymmetry.
In the SemEval 2010 Task 8 dataset (Section~\ref{sec:datasets:semeval}), the first entity in the sentence is always tagged \(e_1\), and the second is always tagged \(e_2\).
The relation set \(\relationSet\) is closed under the converse operation~\parencite{semeval2010task8}:
\begin{equation*}
    \forall r\in\relationSet: \breve{r}\in\relationSet.
\end{equation*}
This is the most common setup.
In this case, the relation labels incorporate the directionality; for example, the SemEval dataset contains both \(\textsl{cause--effect}(e_1, e_2)\) and \(\textsl{cause--effect}(e_2, e_1)\) depending on whether the first entity appearing in the sentence is the cause or the effect.
This means that given a \(r\in\relationSet\) in the SemEval dataset, we can easily query the corresponding \(\breve{r}\).
On the other hand, the relation set of the FewRel dataset (Section~\ref{sec:datasets:fewrel}) is not closed under the converse operation~\parencitex{fewrel}.
Furthermore, it is a mono-relation dataset without \textsl{other}.
This means that all samples \((s, e_1, e_2)\in\dataSet\) convey a relation between \(e_1\) and \(e_2\).
Naturally, in this case, the entity tagged \(e_2\) may appear before the one tagged \(e_1\).
And indeed, for relations that do not have their converse in \(\relationSet\), the same sentence \(s\) with the tags reversed may not appear in the FewRel dataset since this would need to be categorized as \(\breve{r}\not\in\relationSet\).

In general, the order of \(e_1\) and \(e_2\) is not fixed.
This is particularly true in the open-domain relation setup, where \(\relationSet\), being unknown, cannot be equipped with the converse operation.
In this case, it is common to feed the samples in both arrangements: with the first entity tagged \(e_1\) and the second \(e_2\), and the reverse: with the first entity tagged \(e_2\) and the second \(e_1\).
This can be seen as a basic data augmentation technique.

More generally, the relation set \(\relationSet\) might possess a structure called a \emph{relation ontology}.
This is especially true when \(\relationSet\) comes from a knowledge base such as Wikidata~\parencite{wikidata}.
In this case, \(\relationSet\) can be equipped with several operations other than the converse one.
For example, Wikidata endows \(\relationSet\) with a subset operation: the relation \textsl{parent organization} \wdrel{749} is recorded as a subset of \textsl{part of} \wdrel{361}, such that \(\sfTripletHolds{e_1}{parent organization}{e_2} \implies \sfTripletHolds{e_1}{part of}{e_2}\), or using the notation of Section~\ref{sec:context:relation algebra}: \(\textsl{parent organization} \relationOr \textsl{part of} = \textsl{part of}\).

\subsection{Nature of Entities}
\label{sec:relation extraction:entity}
The approach to tackle the relation extraction task also quite heavily depends on the nature of entities.
In particular, an important distinction must be made on whether the \emph{unique referent assumption} is postulated.
This has been the case in most examples given thus far.
For instance, ``Alan Turing'' designates a single human being, even if several people share this name; we only designate one of them with the entity \wdent{7251} ``Alan Turing.''
However, this is not always the case, for example, in the following sample from the SemEval 2010 Task 8 dataset:
\begin{marginparagraph}
    SemEval 2010 Task 8 is one of those datasets without entity linking, which is rather common when dealing with non-unique referents.
\end{marginparagraph}
\begin{indentedexample}
    The \uhead{key} was in a \utail{chest}.\\
    Relation: \(\textsl{content--container}(e_1, e_2)\)
\end{indentedexample}
In this case, the entities ``key'' and ``chest'' do not always refer to the same object.
The relation holds in the small world described by this sentence, but it does not always hold for every object designated by ``key''.
This is closely related to the fineness of entity linking.
Indeed, one could link the surface form ``key'' above with an entity designating this specific key, but this is not always the case, as exemplified by the SemEval 2010 Task 8 dataset.
This distinction is pertinent to the relation extraction task, especially in the aggregate setting.
When applied to entities with a unique referent, the \(\textsl{content--container}(e_1, e_2)\) relation is \(N\to 1\) or at least transitive.
However, when the unique referent assumption is false, this relation is not \(N\to 1\) anymore since several ``key'' entities can refer to different objects located in different containers.

\begin{marginparagraph}[-14mm]
    The aggregate setup is not necessarily contradictory with the unique referent assumption.
    Even though not all ``keys'' are in a ``chest,'' this fact still gives us some information about ``keys,'' in particular they can be in a ``chest,'' which is not the case of all entities.
\end{marginparagraph}
The unique referent assumption is not binary; the distinction is quite fuzzy in most cases.
Should the entity \wdent{142} ``France'' refer both to the modern country and to the twelfth-century kingdom?
What about the West Frankish Kingdom?
How should we draw the distinction?
Instead of categorizing models on whether they take the unique referent assumption for granted, we should instead look at their capacity to capture the kind of relationship between a key and a chest as conveyed by the above sample.

\begin{marginparagraph}
    More generally, all the usual properties of grammatical nouns can lead to variations of the relation extraction task.
    For example, many models focus on rigid designators such as ``Lucius Junius Brutus'' which are opposed to flaccid designators such as ``founder of the Roman Republic.''
    Both refer to the same person \wdent{223440}.
    However, it is possible to imagine a world where the ``founder of the Roman Republic'' does not refer to \wdent{223440}.
    On the contrary, if \wdent{223440} exists, ``Lucius Junius Brutus'' ought to refer to him.
\end{marginparagraph}

Finally, another variation of the definition of entities commonly encountered in relation extraction comes from coreference resolution.
Some datasets resolve pronouns such that in the sentence ``\uent{She} died in Marylebone,'' the word ``she'' can be considered an entity linked to \wdent{7259} ``Ada Lovelace'' if the context in which the sentence appears supports this.
In this case, the surface form of the entity gives little information about the nature of the entity.
This can be problematic for models relying too heavily on entities' surface forms.
In particular, early relation extraction models did not have access to entity identifiers; at the time, pronoun entities were avoided altogether.